Observational Research

PSCI 2270 - Week 10

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

November 9, 2023

Plan for this week



  1. Project Updates

  2. What if we cannot randomize?

  3. Instrumental Variables: Colonial origins

  4. Regression Discontinuity: Finding close elections

  5. Differences-in-differences: Tabloid media in UK

Any project updates?

What if we cannot randomize?

Recap on experiments


  • Key idea: Randomization of the treatment makes the treatment and control groups “identical” on average

  • The two groups are expected to be similar in terms of all characteristics (both observed and unobserved)

    • Control group is similar to treatment group
    • Outcome in control group \(\approx\) what would have happened to treatment group if they did not receive treatment
    • vice versa
  • If we want to study effects of factor \(X\) on \(Y\) we would ideally want to run an experiment

    • But what do we do if we cannot randomize?
    • E.g. we do not have funding, or the phenomena we study, like elections, cannot reasonably be manipulated by the researcher
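The recap above can be illustrated with a small simulation. This is a minimal sketch under assumed numbers: the effect size, the unobserved characteristic `u`, and the selection rule are all hypothetical and chosen only to show why a coin flip balances unobserved characteristics while self-selection does not.

```python
import random
from statistics import mean

random.seed(2270)

N = 20_000
TRUE_EFFECT = 2.0  # assumed effect of treatment on the outcome (hypothetical)


def diff_in_means(treated, values):
    """Average of values among treated units minus average among controls."""
    v1 = [v for d, v in zip(treated, values) if d == 1]
    v0 = [v for d, v in zip(treated, values) if d == 0]
    return mean(v1) - mean(v0)


# Unobserved characteristic (e.g. motivation) that also raises the outcome.
u = [random.gauss(0, 1) for _ in range(N)]
noise = [random.gauss(0, 1) for _ in range(N)]

# (a) Randomized assignment: a coin flip, unrelated to u.
d_random = [random.randint(0, 1) for _ in range(N)]
# (b) Self-selection: units with high u are more likely to take treatment.
d_select = [1 if ui + random.gauss(0, 1) > 0 else 0 for ui in u]


def outcomes(d):
    """Outcome = treatment effect + effect of the unobserved factor + noise."""
    return [TRUE_EFFECT * di + 3.0 * ui + ei for di, ui, ei in zip(d, u, noise)]


est_random = diff_in_means(d_random, outcomes(d_random))
est_select = diff_in_means(d_select, outcomes(d_select))
balance_random = diff_in_means(d_random, u)  # imbalance in u across groups
balance_select = diff_in_means(d_select, u)

print(f"randomized:    estimate={est_random:.2f}, imbalance in u={balance_random:.2f}")
print(f"self-selected: estimate={est_select:.2f}, imbalance in u={balance_select:.2f}")
```

Under randomization the difference in means should land near the assumed effect and the two groups should look alike on `u`; under self-selection the same comparison mixes the treatment effect with the effect of `u`.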

Second-best: “Natural” experiments


  • Natural experiments: Effects of random or as-if random events/processes that occur outside of researcher’s control and are related to factors we are interested in
  • Uses observational data, but is better than correlation

    • Researchers can claim that “treatment” assignment is not related to other factors that can explain outcomes \(\Rightarrow\) Solve the issue of confounding!
  • What is the difference between random and as-if random?

    • Random: We can prove that the process/event was decided by lottery or analogous procedure and know the probabilities
    • As-if random: We know the process/event is not random, but it is unrelated (under assumptions) to other factors that are linked to our outcomes

As-if random example

  • John Snow and the study of cholera in London in 1854 (!)

Study



  • Theories of cholera offered two hypotheses of transmission: water or air

    • How would we test this experimentally?
  • As-if random event:

    • In one area of London where the outbreak happened in 1854, two companies supplied water
    • In 1852 the Lambeth company moved its water intake upstream \(\Rightarrow\) possibly cleaner water
    • John Snow collected data on cholera deaths over 7 weeks

Results

Deaths from cholera epidemic over 7 weeks, 1854
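Water Supply Company   | Number of Houses | Deaths From Cholera | Cholera Deaths per 10,000 Houses
-----------------------|------------------|---------------------|---------------------------------
Southwark and Vauxhall |           40,046 |               1,263 |                              315
Lambeth                |           26,107 |                  98 |                               37
Rest of London         |          256,423 |               1,422 |                               59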
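The comparison between the two companies is a difference in death rates. A minimal sketch, using only the house and death counts reported in the table above (the variable names are illustrative):

```python
# Counts copied from the table above: houses served and cholera deaths.
houses = {"Southwark and Vauxhall": 40_046, "Lambeth": 26_107}
deaths = {"Southwark and Vauxhall": 1_263, "Lambeth": 98}

# Deaths per 10,000 houses for each company.
rate = {company: 10_000 * deaths[company] / houses[company] for company in houses}

# Two ways to compare the "treated" (cleaner water) and "control" areas.
difference = rate["Southwark and Vauxhall"] - rate["Lambeth"]
ratio = rate["Southwark and Vauxhall"] / rate["Lambeth"]

for company, r in rate.items():
    print(f"{company}: {r:.1f} deaths per 10,000 houses")
print(f"difference: {difference:.1f}; ratio: {ratio:.1f}")
```

Houses supplied by Southwark and Vauxhall had a death rate roughly eight times that of houses supplied by Lambeth.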


  • Evidence: Strong support for the water hypothesis!
  • Was the exposure actually random?

    • Did companies choose strategically? No
    • Did people choose strategically? No
    • Are areas served by the company upstream different? No

How to find natural experiments



  • Random: Lotteries related to politically relevant processes; Timing of events

  • As-if random: Borders and historical events

  • Process of looking for natural experiments is always creative, but there are some commonly used types
  • Let’s brainstorm this in groups!

From Dunning (2012)

Making as-if random


  • Analyses of as-if random events often need adjustments to the standard difference-in-means
  • Regression with covariates: Control for possible confounders \(\Leftarrow\) Football affects elections
  • Instrumental variables: Use a factor that shifts the treatment but affects the outcome only through the treatment \(\Leftarrow\) Colonial origins
  • Regression Discontinuity: Use some naturally occurring discontinuity (e.g. taxation) to compare units around it \(\Leftarrow\) Extremists win primaries
  • Differences-in-Differences: Compare trends between treated and untreated units even if we know there might be differences between them \(\Leftarrow\) Tabloid media in UK

Colonial origins of Comparative Development

Colonial origins


  • “The Colonial Origins of Comparative Development: An Empirical Investigation” Acemoglu, Johnson, and Robinson (2001) (17566 citations!)
  • Summary:

    • Observational study on the long-term effects of colonial institutions on country development
    • Heavy use of historical data
    • Institutions are not randomly assigned! Deal with this using instrumental variables
    • Colonial institutions persist over time and affect the income per capita today!

Theory



  • Question: Why are some countries rich and some poor? (\(Y\))

  • Hypothesis: Extractive institutions (\(X\)) are persistent and can affect long-term development

  • Searching for as-if random assignment:

    • Why do countries have different institutions?
    • Some countries were forced into extractive institutions by colonialists
    • Are there as-if random reasons for setting extractive institutions?

Instrumental variables


  • What is an instrument: Some factor that affects our main independent variable (early institutions) but is unlikely to affect our main outcome (current development) directly

    • \(Z \rightarrow X \rightarrow Y\), but \(Z \not\rightarrow Y\)
    • Why is the last part important? Excludability!
    • What is the instrument in Acemoglu, Johnson, and Robinson (2001)?
  • Instrument in Acemoglu, Johnson, and Robinson (2001):

    • (potential) settler mortality \(\Rightarrow\) settlements \(\Rightarrow\) early institutions \(\Rightarrow\) current institutions \(\Rightarrow\) current economic development
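One way to see what the chain \(Z \rightarrow X \rightarrow Y\) buys us: with a single instrument and a single independent variable, the standard IV estimate is the effect of the instrument on the outcome, scaled by the effect of the instrument on the independent variable. The notation \(\hat{\beta}_{\text{IV}}\) and \(\widehat{\text{Cov}}\) (sample covariance) is introduced here for illustration and does not appear in the paper's summary above.

\[ \hat{\beta}_{\text{IV}} = \frac{\widehat{\text{Cov}}(Z, Y)}{\widehat{\text{Cov}}(Z, X)} \]

The numerator is only informative about \(X \rightarrow Y\) if \(Z\) has no other path to \(Y\), which is exactly the excludability requirement; the denominator must be clearly different from zero, which is why the instrument has to actually move \(X\).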

How do we use instruments


  • The instrument affects the outcome only through the independent variable \(\Rightarrow\) Look only at the part of the independent variable's variation that is predicted by the instrument
  • Instead of simple correlation we use a two-stage procedure (2SLS)

    1. Predict the current institutions using historical data on mortality/settlements
    2. Look at the relationship between predicted institutions and economic development
  • To argue that the instrument is valid, the researcher needs to:

    • Show that the instrument indeed predicts the independent variable
    • Provide evidence that the instrument is unlikely to affect the outcome directly
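The two stages above can be sketched on simulated data. Everything here is hypothetical (the coefficients, the confounder `u`, the sample size) and is not the specification or data of Acemoglu, Johnson, and Robinson (2001); the point is only to show the mechanics with one instrument and one independent variable.

```python
import random
from statistics import mean

random.seed(2001)


def ols(x, y):
    """Intercept and slope from a one-regressor least-squares fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


N = 20_000
BETA = 1.5  # assumed true effect of X on Y (hypothetical)

z = [random.gauss(0, 1) for _ in range(N)]  # instrument
u = [random.gauss(0, 1) for _ in range(N)]  # unobserved confounder
# X depends on the instrument AND on the confounder.
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
# Y depends on X and on the confounder, but not directly on Z (excludability).
y = [BETA * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive regression of Y on X picks up the confounder as well.
_, naive = ols(x, y)

# Stage 1: predict X from the instrument.
a0, first_stage = ols(z, x)
x_hat = [a0 + first_stage * zi for zi in z]

# Stage 2: regress Y on the predicted X.
_, tsls = ols(x_hat, y)

print(f"first-stage slope: {first_stage:.2f}")
print(f"naive OLS slope:   {naive:.2f}")
print(f"2SLS slope:        {tsls:.2f}")
```

The first-stage slope is the check that the instrument predicts the independent variable. Running the two stages by hand gives the 2SLS point estimate, but the standard errors from the second regression would not be valid, so in practice a dedicated IV routine is used.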

How do they do it?



  • How do they operationalize economic development (\(Y\)), institutions (\(X\)) and settlements/mortality (\(Z\))?
  • How do they prove that settlements did not affect economic development through other channels?
  • What analyses do they run?

Main results

  • (potential) settler mortality/settlements \(\Rightarrow\) early institutions \(\Rightarrow\) current institutions

Main results

  • (potential) settler mortality/settlements \(\Rightarrow\) current institutions \(\Rightarrow\) current economic development

Main critiques


  • Different sources of data (some predicted mortality, some direct measures) \(\Rightarrow\) Selection bias

  • Data on troops (some in barracks and some on campaign) \(\Rightarrow\) Selection bias

  • Why use data on troops at all if troops are not the same as settlers \(\Rightarrow\) Measurement validity

  • Can settlers affect current levels of development not through institutions? \(\Rightarrow\) Violation of excludability

  • \(\Rightarrow\) No Nobel Prize 😢

Final projects

Final project structure


  1. Introduction (2 pages)

    • Motivation/Background \(\Rightarrow\) Literature review \(\Rightarrow\) Research question \(\Rightarrow\) Brief description of study and how it fits in the literature
  2. Theory and Hypotheses (1-2 pages)

    • (Broad) Theoretical expectations \(\Rightarrow\) (Concrete) Hypotheses you are planning to test
  3. Research design (5-7 pages)

    • Setting and context \(\Rightarrow\) Description of your sample \(\Rightarrow\) What is your “treatment” \(\Rightarrow\) Independent/dependent variables \(\Rightarrow\) Measurement \(\Rightarrow\) Estimation
  4. Possible Issues (1-2 pages)

  5. Appendix (😵‍💫 pages)

    • Code for simulation of data and simple analyses
    • Survey instruments and other details on measurement

Back to observational stuff

Making as-if random


  • Analyses of “as-if” random events often need adjustments to the standard difference-in-means

  • Regression with covariates: Control for possible confounders \(\Leftarrow\) Football affects elections

  • Instrumental variables: Use a factor that shifts the treatment but affects the outcome only through the treatment \(\Leftarrow\) Colonial origins

  • Regression Discontinuity (RDD): Use some naturally occurring discontinuity (e.g. taxation, border, election, football match win) to compare units around it \(\Leftarrow\) Extremists win primaries

  • Differences-in-Differences (DiD): Compare trends between treated and untreated units even if we know there might be differences between them \(\Leftarrow\) Tabloid media in UK

Discontinuity and DiD


  • Old problem: We need to find units that vary in treatment but are very similar on all other dimensions
  • New solutions:

    • RDD: Zoom in onto those who almost got “treatment” (control group) and those who just made it (treatment group)
    • DiD: Utilize some naturally occurring event that splits the sample into two groups and assume that if “treated” units had not received “treatment” they would have behaved like “control” units

Spot it

  • Can you guess which plot is characteristic of which design?

  • \(\Leftarrow\) RDD design: “forcing variable” on X axis and outcome of interest on Y axis

  • \(\Rightarrow\) DiD design: trends in outcomes between “treated” and “control” units before and after the event (and also projects what the “treated” units’ “untreated” potential outcomes would be)

  • Try to guess what the major issues to consider are with either of those designs

When extremists win primaries

When extremists win primaries


  • “What Happens When Extremists Win Primaries?” Hall (2015)
  • Summary:

    • Observational study on the short-/long-term effects of having extremists run in general elections
    • Use primaries to find comparable cases where extremists just win and just lose
    • Having extremists selected in primaries leads to a lower chance of winning and lower representation

Theory



  • Question: Is having more ideologically extreme candidates good or bad for the party/voters?

  • Hypothesis: More extreme party candidate today \(\Rightarrow\) lower chances of winning today \(\Rightarrow\) lower representation today / lower future incumbency advantage

  • Searching for as-if random assignment:

    • We cannot just compare general election results since there is selection bias
    • But how do extremists get to general elections? Through primaries
    • Let’s look at when extremists win primaries and try to find a “coin-flip” situation

Making discontinuity


  • We need to look at the primaries that were almost won and just won by extremists \(\Rightarrow\) The extremist’s vote share in the primary is the forcing variable

  • Two questions:

    1. How do we define extremists?
    2. What are the primaries where both extremists and moderates could win?

Analysis


  • They estimate the following regression:

\[ \color{#98971a}{Y_{ipt}} = \beta_{0} + \color{#458588}{\beta_{1}} \color{#d65d0e}{\text{Extremist Primary Win}_{ipt}} + f(V_{ipt}) + \epsilon_{ipt} \]

  • \(i\) usually indexes the unit within one time period, \(t\) usually indexes time, and \(p\) indexes party

  • Outcome of interest (\(Y_{ipt}\)): vote share/victory/roll-call voting score

  • Treatment indicator (\(\text{Extremist Primary Win}_{ipt}\)): whether an extremist (vs a moderate) wins a contested primary and proceeds to the general election

  • RDD treatment estimate: \(\beta_{1}\)

  • What is \(f(V_{ipt})\)?
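To make the role of \(f(V_{ipt})\) concrete, here is a minimal sketch on simulated data. One common choice, assumed here and not necessarily the specification in Hall (2015), is to let \(f\) be a separate straight line on each side of the cutoff within a narrow bandwidth, so that \(\beta_{1}\) is the gap between the two fitted lines at the cutoff. The jump size, slope, and bandwidth below are hypothetical.

```python
import random
from statistics import mean

random.seed(2015)


def ols(x, y):
    """Intercept and slope from a one-regressor least-squares fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


N = 10_000
JUMP = -0.10      # assumed effect of an extremist win at the cutoff (hypothetical)
BANDWIDTH = 0.10  # only use primaries decided by 10 points or less

# Forcing variable: extremist's winning margin in the primary, centered at 0.
v = [random.uniform(-0.5, 0.5) for _ in range(N)]
win = [1 if vi > 0 else 0 for vi in v]
# The outcome trends smoothly in v and jumps at the cutoff.
y = [0.5 + 0.4 * vi + JUMP * wi + random.gauss(0, 0.05) for vi, wi in zip(v, win)]

# Naive comparison of all winners with all losers ignores the trend in v.
naive = mean(yi for yi, wi in zip(y, win) if wi == 1) - mean(
    yi for yi, wi in zip(y, win) if wi == 0
)

# Local linear fit on each side of the cutoff, within the bandwidth.
left = [(vi, yi) for vi, yi in zip(v, y) if -BANDWIDTH <= vi <= 0]
right = [(vi, yi) for vi, yi in zip(v, y) if 0 < vi <= BANDWIDTH]
left_intercept, _ = ols(*zip(*left))
right_intercept, _ = ols(*zip(*right))

# Because v is centered at the cutoff, each intercept is the fitted value at v = 0.
rdd_estimate = right_intercept - left_intercept

print(f"naive difference in means: {naive:.3f}")
print(f"RDD estimate at cutoff:    {rdd_estimate:.3f}")
```

In this simulation the naive comparison even has the wrong sign, because winners with large margins differ from losers with large margins for reasons unrelated to the treatment; zooming in on close races removes that difference.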

(Many) results


Discussion



  • What are possible issues here?

    • Excludability: Is there sorting around the cut-off?
    • Selection bias: Are extremists the same as moderates on anything besides ideology? Are districts with contested primaries different from others?
    • Measurement error: Lots of possibly imprecise measures (including gender ?!)…
  • Still better than just a correlation analysis
  • Great visual representation of results!!

Effects of S*n in UK

Effects of S*n in UK


  • “Tabloid Media Campaigns and Public Opinion: Quasi-Experimental Evidence on Euroscepticism in England” Foos and Bischof (2022)
  • Summary:

    • Observational study on the long-term effects of exposure to tabloid media
    • Media consumption is not random, so they utilize the local boycott of a tabloid newspaper in Merseyside after the Hillsborough disaster
    • Show that a decrease in exposure to tabloid media led to a large long-term decrease in support for leaving the EU

Theory



  • Question: What are the long-term consequences of exposure to tabloid media?

  • Hypothesis: Boycott of tabloid media \(\Rightarrow\) less exposure of youth and the working class to this tabloid’s coverage \(\Rightarrow\) less support for leaving the EU \(\Rightarrow\) less voting to leave the EU during the 2016 Referendum

  • Searching for as-if random assignment:

    • Media consumption does not change drastically
    • We need to find a “shock” to media consumption/availability \(\Rightarrow\) Boycott
    • Since Boycott is localized we can compare outcomes in areas with Boycott to those without

Looking at differences in differences


  • While the Boycott is a shock to media consumption, it is not random and is a compound treatment (affects many things at once)

  • Two questions:

    1. How do we show that it was a shock and where?
    2. How do we approximate what the outcomes would have been had there been no shock? Parallel trends!
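In the simplest case with two groups and two periods, the parallel trends idea turns into a single formula. The \(\bar{Y}\) notation (average outcome in a group and period) is introduced here for illustration:

\[ \hat{\delta}_{\text{DID}} = \left( \bar{Y}_{\text{treated, after}} - \bar{Y}_{\text{treated, before}} \right) - \left( \bar{Y}_{\text{control, after}} - \bar{Y}_{\text{control, before}} \right) \]

The second bracket is the change the “treated” group is assumed to have experienced without the shock. If the two groups would have followed parallel trends, subtracting it leaves only the effect of the shock.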

Analysis


  • They estimate the following regression:

\[ \color{#98971a}{\text{leavingEU}_{i,c,t}} = \alpha_{c} + \gamma_{t} + \color{#458588}{\delta_{\text{DID}}} \color{#d65d0e}{T_{c,t}} + \varepsilon_{i,c,t} \]

  • \(i\), as before, indexes the unit (respondent) in one time period, \(t\) indexes time, and \(c\) indexes constituency

  • Outcome of interest (\(\text{leavingEU}_{i,c,t}\)): answer to a question about support for leaving the EU

  • Treatment indicator (\(T_{c,t}\)): whether the respondent resides in a Merseyside constituency after the start of the Boycott

  • DiD treatment estimate: \(\delta_{\text{DID}}\)

  • What are \(\alpha_{c}\) and \(\gamma_{t}\)?
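A minimal simulated sketch of why group-specific baselines and common period shifts matter, in the simplest two-group, two-period case. All numbers are hypothetical and this is not the data or the exact specification of Foos and Bischof (2022): `ALPHA` plays the role of \(\alpha_{c}\), `GAMMA` the role of \(\gamma_{t}\), and `DELTA` the role of \(\delta_{\text{DID}}\).

```python
import random
from statistics import mean

random.seed(1989)

N = 5_000      # simulated respondents per group-period cell
DELTA = -0.10  # assumed effect of the shock on the outcome (hypothetical)
ALPHA = {"treated": 0.30, "control": 0.45}  # group baselines (alpha_c)
GAMMA = {"before": 0.00, "after": 0.08}     # shifts common to all groups (gamma_t)


def cell_mean(group, period):
    """Average simulated outcome for one group in one period."""
    effect = DELTA if (group == "treated" and period == "after") else 0.0
    return mean(
        ALPHA[group] + GAMMA[period] + effect + random.gauss(0, 0.2)
        for _ in range(N)
    )


ybar = {(g, p): cell_mean(g, p) for g in ALPHA for p in GAMMA}

# Comparing groups after the shock mixes the effect with baseline differences.
naive_cross_section = ybar[("treated", "after")] - ybar[("control", "after")]
# Comparing the treated group over time mixes the effect with the common trend.
naive_before_after = ybar[("treated", "after")] - ybar[("treated", "before")]
# Differencing both ways removes the baselines and the common trend.
did = naive_before_after - (ybar[("control", "after")] - ybar[("control", "before")])

print(f"treated vs control, after only: {naive_cross_section:.3f}")
print(f"treated, after vs before:       {naive_before_after:.3f}")
print(f"difference-in-differences:      {did:.3f}")
```

Each naive comparison is off by a different amount, while the double difference should recover something close to the assumed effect, provided the common-trend assumption built into the simulation holds.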

(Many) results


Discussion



  • What are possible issues here?

    • Compliance: Is it the case that Merseyside actually implemented Boycott?
    • Excludability: Boycott likely affected many things at the same time, do they answer this concern well?
    • Non-interference: Could Boycott in Merseyside affect exposure to Sun or its coverage in other places?
  • Still better than just a correlation analysis

Conclusion on observational studies


  • With observational designs we are trying to approximate experiments to avoid confounding
  • There is an ordering of designs in terms of how persuasively they avoid confounding:

    • Pure correlation \(<\) Controlling for possible confounders \(<\) Instrumental variables \(<\) RDD/DiD \(<\) Natural experiments \(<\) Randomized experiments
  • The order is often reversed if we are concerned about naturalistic setting/long-term effects/logistical costs:

    • Randomized experiments \(<\) Natural experiments \(<\) RDD/DiD/Instrumental variables \(<\) Controlling for possible confounders \(<\) Pure correlation
  • This is the main trade-off (and battle) faced by researchers conducting large-\(N\) studies

Next week: Ethical concerns and small-\(N\) studies

References

Acemoglu, Daron, Simon Johnson, and James A. Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91 (5): 1369–1401.
Dunning, Thad. 2012. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press.
Foos, Florian, and Daniel Bischof. 2022. “Tabloid Media Campaigns and Public Opinion: Quasi-Experimental Evidence on Euroscepticism in England.” American Political Science Review 116 (1): 19–37.
Hall, Andrew B. 2015. “What Happens When Extremists Win Primaries?” American Political Science Review 109 (1): 18–42. https://doi.org/10.1017/S0003055414000641.